Abstract: We propose and implement WalkieLokie, a novel relative
positioning system that enables more kinds of Augmented Reality
applications, e.g., virtual shopping guides and virtual business card
sharing. WalkieLokie calculates the distance and direction between an
inquiring user and the corresponding target. It only requires a dummy
speaker that is bound to the target and broadcasts inaudible acoustic
signals; a user walking around can then obtain the position with a
smart device. The key insight is that when a user walks, the distance
between the smart device and the speaker changes, and the pattern of
displacement (variance of distance) corresponds to the relative
position. We use a second-order phase locked loop to track the
displacement and then estimate the position. To enhance accuracy and
robustness, we propose a synchronization mechanism that synthesizes
the estimation results from different timeslots. We show that the mean
errors of ranging and direction estimation are 0.63m and 2.46 degrees
respectively, which is accurate enough even for virtual business card
sharing. Furthermore, in a shopping mall, where the environment is
quite severe, we still achieve high accuracy when positioning one
dummy speaker, with a mean position error of 1.28m.


1 Introduction 

With the rapid development of smart phones and wearable devices,
attractive Augmented Reality (AR) apps have been developed, e.g.,
Sky Map, Wikitude, Augmented Car Finder. One of AR's key features is
to display useful information about a person's surroundings, which
relies on location information. For example, Wikitude uses GPS and
inertial sensors to provide interactive information about objects
that are seen through the camera of smart devices.

In this paper, we explore localization techniques to enable more
kinds of AR applications on smart devices. For instance, when a
person walks in a large shopping mall, a virtual shopping guide
recommends surrounding goods that are new arrivals or on sale; or a
person shares her/his virtual business card with people walking
around at a party. Such applications require knowledge of the
relative position between targets (e.g., goods, persons) and
inquiring users.

However, current localization systems cannot be directly applied to
relative positioning satisfactorily due to various limitations. For
instance, these systems can only be used in places where specified
infrastructure has been deployed, or they require feature-rich
hardware serving as the target. More specifically, GPS can calculate
the location of outdoor users, but is unavailable in indoor
environments. Pure WiFi-based indoor localization can achieve 3 to 4
meter accuracy in absolute positioning, and large errors (e.g., 6 to
8m) always exist [6]. So the errors are much greater than 4 meters
when inferring the relative position from the absolute positions of
the smartphone and the target. Other indoor localization schemes [7],
[19] are accurate enough (e.g., sub-meter accuracy), but require
special-purpose infrastructure or hardware. There are schemes that
calculate relative direction (e.g., Swadloon [3]) and distance (e.g.,
BeepBeep [12]). However, Swadloon requires unusual behavior of the
querying user (i.e., a phone-shaking movement) before getting the
direction of a target. More importantly, Swadloon cannot obtain the
distance from the target. Though BeepBeep can be added for
calculating distance, it requires that the target has rich functions,
such as broadcasting and receiving acoustic signals, communication
for exchanging data and computation for processing data. This is
feasible when the target is a smartphone, which has all these
functions, but other applications, such as a shopping guide, may
prefer a cheaper target device with far fewer functions.

We propose and implement WalkieLokie, which calculates the relative
position from a user with a smart device to a target for Augmented
Reality. WalkieLokie does not need any deployed infrastructure, and
its application is not limited to particular places. The only
requirement of WalkieLokie is that a dummy speaker is attached to the
target for broadcasting audio, which can be received by the smart
device and then processed to directly infer the relative position.
The dummy speaker merely broadcasts audio and does not require any
other features, e.g., audio recording, communication or computation.
Such speakers are widely used, and some of them are cheap and simple,
such as the speaker embedded in a user's smart device, or even a
loudspeaker originally used for sales promotion in a shopping mall.
Moreover, the broadcast audio is inaudible, so the loudspeaker, which
used to be a noisy tool for sales promotion, can now be "silent" for
the same job by "broadcasting" its relative position.

Our work is based on the observation that when a 
user walks, the distance between the object and the user 
changes; and the pattern of displacement (variance of 
distance) relates to the relative position. In other words, 
by letting a device receive and analyze the signal (audio 
signal mixed with non-audible signal) broadcast by a 
target (speaker), we are able to track the displacement 
and further compute the relative position accurately 
and efficiently, i.e., finishing both ranging and direction 
estimation at the same time. 

However, we have to solve a number of issues in our scheme. First,
since the displacement is relatively small, the practical challenge
is how to obtain the position precisely when a user walks for only a
few steps. Second, when the user is far from the speaker, the
measured displacement is prone to be influenced by noises, and the
difference of displacement becomes less distinguishable, such that
even a tiny error in the measurement could cause large errors in
positioning. Hence, both an accurate displacement measurement scheme
and a robust positioning strategy are needed.

To measure the displacement accurately, we track the phase of the
signal (corresponding to the displacement) with a second-order Phase
Locked Loop (PLL), which avoids jitters and has high accuracy when
the signal is weak. Hence, the distance and direction can be computed
accurately when a user is close to a speaker. As mentioned above, the
estimated position may have a bigger error when a user is far from a
speaker. In this case, we adopt two strategies to further improve the
accuracy and robustness. One is to use the measurement results
obtained when the user was close to the speaker, if available.
Otherwise, we synthesize all the estimations (over the longer path
the user has walked) by the synchronization scheme. The main idea is
based on the following observation: the distance can be obtained from
the difference between the sending time and the receiving time of a
signal, where the receiving time is directly computed but the sending
time is unknown to the receiver. We therefore add a periodical pulse
into the audio to get the sending time in a novel way. Specifically,
when a good estimation is obtained, the distance along with the
sending time of the pulse is calculated. The sending time of the
later periodical pulses can then be predicted, and the distance is
inferred from the sending time and receiving time of those pulses.

WalkieLokie also addresses a number of practical issues based on the
main solution:

The user frequently turns the walking direction: we provide an
enhanced algorithm that gathers all the small linear segments at
different directions to obtain the position.

Multipath effects on pulse detection: WalkieLokie detects the arrival
time of all the pulses, including the pulse arriving directly from
the sender and the reflected ones. Then it eliminates the false
pulses by leveraging the result of PLL.

Non-Line-of-Sight (NLoS): WalkieLokie uses historical position
results, and infers the current position by additionally using
inertial sensors; once the smart device is within the coverage of the
signal, WalkieLokie updates the accurate position by synchronization.

Device diversity: The main problem is the serious clock drift of
ordinary dummy speakers; if it is not corrected, the receiver obtains
wrong receiving times of the periodical pulses and hence wrong
distances in synchronization. We leverage the result of PLL and
calibrate the clock precisely, provided that the receiver is static
for only a few seconds.

Device orientation: Different orientations of the speaker or receiver
affect the quality of the received signal. We find that the quality
mainly affects the result of PLL and hence the displacement. More
specifically, when the signal quality is poor for a certain
orientation, the tracked displacement becomes smaller than the real
displacement. To enhance the accuracy, we calibrate the tracked
displacement based on our measurements.

Conflicts of multiple signals: In WalkieLokie, the periodical pulses
used in synchronization occupy bandwidth, which limits the number of
co-existing signals. We carefully design a pulse that occupies a
narrow bandwidth, and also show how to support a larger number of
co-existing speakers.

Noisy environment: WalkieLokie uses a Band Pass Filter to eliminate
the noises, and it works well in the noisy shopping mall.

We implement WalkieLokie and evaluate the performance of each
component separately in several types of cases, and then the
performance of WalkieLokie as a whole.

a) When a user is within 8 meters of a speaker, the mean errors of
ranging and direction estimation are 0.63m and 2.45°. This is
considerable accuracy for a user sharing a virtual business card with
people nearby.

b) When the user is within 20m and uses synchronization for
positioning, the ranging and direction estimation errors are less
than 0.32m and 2.81° respectively in 80% of the cases. Note that
these results only reflect the accuracy of the subcomponent
(synchronization), not the total accuracy of WalkieLokie.

c) We combine all the work together and evaluate WalkieLokie in a
severe environment, i.e., the noisy shopping mall. We conduct the
experiment in 2 cases:

• c1) Relative positioning of one speaker;

• c2) Absolute positioning using multiple speakers (i.e., ordinary
indoor localization).

We put 5 dummy speakers in a 600m² area, where the speakers can only
be deployed at restricted positions (at the side of the aisle,
instead of on the ceiling). Even in this case, by using only one of
these speakers, WalkieLokie achieves a mean error of 1.28m for the
relative position. Hence, a user knows the accurate relative position
of a virtual shopping guide to which a dummy speaker is attached.

To explore the possibility of absolute positioning, we also use all
the 5 speakers as anchors for localization and the mean error is
0.89m, where each position is covered by the signal of less than 2
speakers on average. Since normal anchor-based systems only obtain
distance or direction but cannot calculate both metrics, they require
at least 3 anchors for trilateration. Hence, this shows another
advantage of WalkieLokie: it is robust when the anchors are sparsely
deployed.

The rest of the paper is organized as follows. We first present the
overview of WalkieLokie in Section 2. We propose the position
estimation based on displacement tracking in Section 3 and the
displacement tracking method in Section 4. We give the details of the
synchronization in Section 5. We report our extensive experimental
results in Section 6. We review related work in Section 7. We
conclude the paper in Section 8.





2 Overview 

2.1 Problem Description 

WalkieLokie calculates the relative position between a user with a
smart device and a target to which a dummy speaker is attached, where
the relative position can be decomposed into the distance and
direction from the smart device to the dummy speaker. The dummy
speaker merely broadcasts inaudible audio without requiring any other
features. The smart device has a microphone and inertial sensors
(i.e., compass, accelerometer, gyroscope), which are common
components in almost all smart devices.

2.2 Intuitive Solution

The key insight of our paper is that when a user walks 
along a line, the pattern of displacements from the 
user to a dummy speaker is related to relative position 
directly. 

We illustrate the intuitive solution with a simple case in Figure 1a,
where a user walks and steps at $O_1$, $O_2$, and $O_3$. Suppose the
displacements $d_1 (= l_1 - l_2)$ and $d_2 (= l_2 - l_3)$ are
measured beforehand and the user's stride length ($|O_1O_2|$) is
given. Intuitively, $d_1 \approx 0$ implies that $O_1$ and $O_2$ are
close to $H$, where $AH \perp O_1O_2$, and $d_2 < 0$ tells us that
the speaker is behind the walking user, which gives the
coarse-grained direction. Another observation is that when the
distance $|AH|$ increases, the value of $|d_2 - d_1|$ decreases,
which gives the coarse-grained distance as well. So the relative
position between $O_1$ and $A$ can be estimated.
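
The intuition can be checked numerically. The following is a minimal sketch, assuming a flat 2-D geometry with the user stepping along the x-axis; the helper `displacements`, the 0.7 m stride and the speaker positions are illustrative assumptions, not values from the paper.

```python
import math

def displacements(speaker, stride=0.7, steps=3):
    """Return d_i = l_i - l_{i+1} for step points O_1, O_2, ... on the x-axis."""
    ax, ay = speaker
    l = [math.hypot(ax - i * stride, ay) for i in range(steps)]
    return [l[i] - l[i + 1] for i in range(steps - 1)]

# H (the foot of the perpendicular from A) lies midway between O_1 and O_2,
# so d_1 is ~0; d_2 < 0 because the user then walks away from the speaker.
gaps = []
for ah in (1.0, 2.0, 4.0, 8.0):          # candidate values of |AH| in meters
    d1, d2 = displacements((0.35, ah))
    gaps.append(abs(d2 - d1))
    print(f"|AH|={ah:4.1f} m  d1={d1:+.4f}  d2={d2:+.4f}  |d2-d1|={gaps[-1]:.4f}")
```

The printed gap $|d_2 - d_1|$ shrinks as $|AH|$ grows, which is the coarse-grained distance cue described above.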

2.3 Main Technical Issues 

From the above example, the following main technical issues need to
be solved:

Formal solution of relative positioning (Section 3). Given the
real-time relative displacement, we need to calculate the precise
relative position, instead of a coarse-grained one.

Fig. 1: Example of relations between displacement and relative
positions, and architecture of WalkieLokie.

Tracking relative displacement (Section 4). The relative displacement
needs to be tracked before relative positioning. Note that, to the
best of our knowledge, current approaches cannot directly obtain the
distances $l_i$ without synchronization between the receiver and the
dummy speaker. They require additional capabilities of the speaker,
such as a communication capability for exchanging synchronization
information [12]. Instead, we calculate the displacement
$d_1 (= l_1 - l_2)$ by PLL.

Extended solution when distance is longer (Section 5). When $|AH|$
becomes much longer, $|d_2 - d_1|$ becomes much smaller. Since there
are errors in tracking the displacements $d_1, d_2$, the ideal case
is that a small change of distance corresponds to a large change in
$|d_2 - d_1|$, which results in high accuracy of the calculated
distance. However, when $|AH|$ becomes much larger, the opposite
holds: a tiny error in measuring $d_1$ or $d_2$ results in a large
error in the calculated $|AH|$. Hence, the accuracy of ranging
declines as $|AH|$ becomes larger, and we need an extended solution
in this case. Note that the accuracy of direction finding is not much
affected, so we propose the extended solution mainly for ranging.
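
A back-of-the-envelope estimate makes this sensitivity concrete. It is only a sketch under assumptions that are not stated in the source: the user is near $H$, the stride $s$ is much smaller than $y = |AH|$, and $u$ denotes the signed offset of a step point from $H$ along the walking line, so that $l(u)$ is the distance to the speaker.

```latex
\begin{align*}
l(u) &= \sqrt{y^2 + u^2}, & l''(u) &= \frac{y^2}{l(u)^3}, \\
d_1 - d_2 &= l_1 - 2 l_2 + l_3 \approx s^2\, l''(u_2) \approx \frac{s^2}{y}
  & &(u_2 \approx 0), \\
y &\approx \frac{s^2}{|d_2 - d_1|}, &
\left|\frac{\partial y}{\partial (d_2 - d_1)}\right| &\approx \frac{y^2}{s^2}.
\end{align*}
```

With an illustrative stride of $s = 0.7$m, a 1mm error in $d_2 - d_1$ maps to roughly 0.13m of ranging error at $y = 8$m, but to roughly 0.8m at $y = 20$m, since the amplification grows quadratically with the distance.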

2.4 Architecture 

To solve the technical issues, we divide WalkieLokie into 3 main
components, shown in Figure 1b: input of smart device, acoustic
processing, and positioning scheme.

Input: The microphone and inertial sensors are used in WalkieLokie.
The microphone records audio for acoustic processing. The inertial
sensors mainly serve as a step counter, which records the time when
the user steps on the ground. When the user turns, the angle of the
user's rotation is also calculated by the gyroscope.

Acoustic processing: This component generates intermediate results in
preparation for the positioning scheme.

One result is the relative displacement, which is tracked by
analyzing the recorded audio (Section 4). The audio first passes
through a Band Pass Filter (BPF), which passes the signal at the
specified frequency and eliminates other signals, including human
voice and noises. Then the filtered signal is processed by Automatic
Gain Control (AGC), which makes the amplitude of the signal close to
constant. The signal then passes through our carefully designed Phase
Locked Loop (PLL), which tracks the phase of the signal; this phase
is proportional to the relative displacement.

Another intermediate result provides additional information for the
extended solution. More specifically, we encode periodical pulses in
the sent signal, and the smart device detects the corresponding
pulses to determine their receiving time (Section 5). The difficulty
is that the pulse should take very little bandwidth; otherwise the
number of concurrent speakers is severely limited. We carefully
encode the signal to solve this problem, and design the pulse
detection algorithm to precisely determine the receiving time of the
pulses. Note that the pulse detection analyzes the tracked phase
output by the PLL, because we modulate the phase directly, rather
than the raw audio, to encode the pulses and save bandwidth.

Positioning scheme: The scheme calculates the position from the
intermediate results. It first estimates the position by using the
relative displacements and the user's step times. It leverages the
intuitive solution in Section 2.2, which is formally described in
Section 3. Then, if the computed distance is short (< 8m), the
calculated position is accurate enough and is accepted as a valid
result. Otherwise, the calculated direction is accurate, but the
calculated distance is not. In this case, the scheme invokes the
synchronization in Section 5 to compute the relative position. The
synchronization uses the historical results of relative position to
infer the distance. By additionally using the historical receiving
time of the pulses, the sending time of the periodical pulses is then
calculated. The accurate distance is then inferred from the detected
receiving time and the predicted sending time of the current pulse.

3 Estimation of Relative Position 

In this section, we propose the method for estimating the distance
and direction from the smart device to the speaker, i.e., the
relative position. The intuition is that when the user walks, there
is a unique pattern of displacement for each relative position.
Hence, we use the displacements (calculated in Section 4) to deduce
the user's positions.

3.1 Positioning When User Walks along a Line 

To estimate the distance, we first consider a simple scenario in
which a user starts walking from $O_1$ and steps at $O_2, O_3,
\ldots, O_n$, as shown in Figure 2. $A$ is the position of the
speaker. Denote the height of the speaker relative to the smart
device as $h = |AG|$. Assume the stride length is close to constant,
$s = |O_iO_{i+1}|$. Both $h$ and $s$ are assumed to be given and are
used for distance and direction estimation. The other inputs are the
displacements of all the steps, i.e., $d_i = l_i - l_{i+1}$ for the
step $O_iO_{i+1}$, which are calculated using PLL. Observe that the
distance from the speaker to the line $O_iO_{i+1}$ is constant,
$y = |AH|$, where $AH \perp O_1O_n$. Hence, we first estimate
$x = |HO_1|$ and $y$ from those inputs and then estimate the position
at each step point $O_i$ according to $x$ and $y$.

Fig. 2: Positioning when the user walks along a line.

Intuitively, $x$ and $y$ can be found by traversing the positions and
using maximum likelihood estimation. Specifically, as
$|HO_i| = x + (i-1)s$, $i = 1, 2, 3, \ldots$, we denote

$$l'_i = \sqrt{y^2 + (x + (i-1)s)^2} \quad (1)$$

$$e_i = l'_i - l'_{i+1} - d_i \quad (2)$$

Then $e_i = 0$ if $d_i$ is accurate. Hence, for $n$ displacements
$d_1, d_2, \ldots, d_n$, $x$ and $y$ can be solved from the above $n$
equations by $(x, y) = \arg\min_{x,y} \sum_{i=1}^{n} e_i^2$. Here we
use Newton's Method [9] to reduce the computation overhead.
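
The minimization over Eqs. (1) and (2) can be sketched in a few lines. Note the swap: for simplicity and robustness this sketch uses a coarse-to-fine grid search rather than Newton's Method, and the function names, search ranges and synthetic geometry are assumptions made for illustration only.

```python
import math

def predicted_displacements(x, y, s, n):
    """d_i = l'_i - l'_{i+1} from Eq. (1), for i = 1..n."""
    l = [math.hypot(y, x + i * s) for i in range(n + 1)]
    return [l[i] - l[i + 1] for i in range(n)]

def cost(x, y, s, d):
    """Sum of the squared errors e_i of Eq. (2)."""
    pred = predicted_displacements(x, y, s, len(d))
    return sum((p - m) ** 2 for p, m in zip(pred, d))

def estimate_xy(d, s, x_range=(-20.0, 20.0), y_range=(0.1, 20.0),
                grid=41, rounds=6):
    """Coarse-to-fine grid search for (x, y) = argmin sum e_i^2."""
    x_lo, x_hi = x_range
    y_lo, y_hi = y_range
    bx = by = 0.0
    for _ in range(rounds):
        dx = (x_hi - x_lo) / (grid - 1)
        dy = (y_hi - y_lo) / (grid - 1)
        candidates = ((cost(x_lo + i * dx, y_lo + j * dy, s, d),
                       x_lo + i * dx, y_lo + j * dy)
                      for i in range(grid) for j in range(grid))
        _, bx, by = min(candidates)
        # Shrink the window to two grid cells around the current best point.
        x_lo, x_hi = bx - 2 * dx, bx + 2 * dx
        y_lo, y_hi = max(by - 2 * dy, 1e-6), by + 2 * dy
    return bx, by

# Synthetic check: noiseless displacements for a known geometry.
true_x, true_y, stride = -3.0, 4.0, 0.7
measured = predicted_displacements(true_x, true_y, stride, 8)
x_est, y_est = estimate_xy(measured, stride)
print(f"x = {x_est:.3f} m, y = {y_est:.3f} m")
```

Here $x$ is treated as a signed offset, so a negative value means $H$ lies ahead of $O_1$. With noisy displacements the same objective applies, but the recovered $y$ degrades as the speaker gets farther away.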

Observe that $L_i = |GO_i|$ and $\beta_i = \angle GO_iO_n$, rather
than $l_i$ and the corresponding three-dimensional direction, are the
horizontal distance and direction used for positioning. Once $x$ and
$y$ are estimated, we obtain the distance and direction results from
the following equations:

Li = 


3.2 Synthesizing When User Turns Direction 

When a user turns while walking, we can still calculate the relative
position as follows. Assume that the user starts from $O_a$ and walks
along the linear segments $O_aO_b$, $O_bO_c$, $O_cO_d$, $O_dO_e$ in
Figure 3. We use the calculated displacements in this case. We also
use the step counter to estimate the linear lengths $n_a s$, $n_b s$,
$n_c s$, where $s$ is the stride length and $n_a$ is the number of
steps the user takes when walking from $O_a$ to $O_b$.
Calculating angle of turning direction: We calculate the angle of the
user's rotation mainly with the gyroscope when the user turns. Though
Zee [15] can directly calculate the walking direction, it is mainly
affected by the inaccurate compass and usually cannot distinguish
whether the user is walking forward or backward along a direction.
WalkieLokie does not require the absolute walking direction, because
we only need to know the relative position from the user to the
target. It only requires the angle of the user's rotation when the
user turns, which is used for calculating the position in this
section. For instance, assume the initial walking direction is
$\zeta_a$ and the following direction is $\zeta_b$. We do not
calculate $\zeta_a$ or $\zeta_b$ with the magnetic sensor, but
directly calculate the difference of the walking directions, i.e.,
$\zeta_b - \zeta_a$, from the gyroscope. The purpose is to avoid
errors caused by the magnetic sensor of the smart device, which might
be huge in indoor environments. Note that $\zeta_a$ can be
eliminated, as we will convert the position in WCS (World Coordinate
System) into the one in RCS (Relative Coordinate System), as
mentioned in Section 6. Hence, our problem is much easier, since the
gyroscope can accurately calculate the angle of the user's rotation.
By using this rotation angle together with step detection, we can get
the walking trace without knowing the relative position. Furthermore,
once the relative position has been obtained with the help of the
acoustic signal, a coarse-grained position can still be obtained with
the same technique by using only the inertial sensors when no signal
is received (i.e., under NLoS effects).

Calculating position: Now we calculate the relative position
$G(g_x, g_y)$, which is the projection of the acoustic speaker $A$
onto the horizontal plane at the same height as the receiver. Let
$O_a$ be at $(0, 0)$; we estimate the next positions $O_b, O_c, O_d,
\ldots$ from the step counter and gyroscope. For example, $O_c$ is at
the position $(c_x, c_y) = (n_a s\cos\zeta_a + n_b s\cos\zeta_b,\;
n_a s\sin\zeta_a + n_b s\sin\zeta_b)$, and so forth. Given the
calculated displacements $d_{c1}, d_{c2}, \ldots$, similar to Eq.
(1), (2), the distance $l_{ci}$ for each stride point is given by Eq.
(4).

Fig. 3: Positioning when the user walks and turns.

4 Tracking Displacement

In this section, we show how we design the acoustic wave emitted by
the dummy speaker and how we infer the displacement from the acoustic
wave when the user walks.

4.1 Brief Design of the Acoustic Wave

The modulated wave in the audio is used for two purposes:
displacement tracking and synchronization. Hence, the wave $s(t)$
contains two parts respectively. More specifically, we formally
define the wave as

$$s(t) = \begin{cases} s_1(t), & kT_2 \le t < kT_2 + T_1 \\ s_2(t), & kT_2 + T_1 \le t < (k+1)T_2 \end{cases} \quad (7)$$

where $T_2 = 0.25$s is the cycle of the wave and $k$ is a natural
number. $T_1 = 0.16$s is the duration of $s_1(t)$ in each cycle.

We mainly use $s_1(t)$ for tracking the displacement. Intuitively,
$s_1(t)$ is a sine wave, and the phase of the corresponding received
signal $r_1(t)$ changes when the distance changes. We show in the
following subsection that the phase of $r_1(t)$ is proportional to
the relative displacement. So by tracking the phase of $r_1(t)$, the
displacement is tracked. The displacement tracking algorithm (PLL) is
detailed in this section. The relation between phase and displacement
is illustrated in Section 4.2, and we explain how to preprocess the
signal and track the phase in Sections 4.3 and 4.4 respectively.

Note that $s_2(t)$ is not only used for synchronization, but is also
capable of displacement tracking, like $s_1(t)$. As a result, the
measurement of displacement is rarely affected by the additional
function of synchronization. In Section 5, we discuss the use of
$s_2(t)$.

4.2 Phase & Displacement 

In order to track displacement, we define $s_1(t)$ as follows:

$$s_1(t) = \cos(2\pi f t) \quad (8)$$

$$l_{ci} = \sqrt{[c_x + (i-1)s\cos\zeta_c - g_x]^2 + [c_y + (i-1)s\sin\zeta_c - g_y]^2 + h^2} \quad (4)$$

Denote the calculated error at the $i$th step along line $O_cO_d$ as

$$e_{ci} = l_{ci} - l_{c(i+1)} - d_{ci} \quad (5)$$

where $f$ is the frequency. To make the audio inaudible and to keep
the frequency supported by commercial speakers, we set
$17000\,\mathrm{Hz} < f < 24000\,\mathrm{Hz}$.

On receiving the signal $r_1(t)$, there is a phase shift $\phi$
compared with $s_1(t)$, such that $r_1(t) = \cos(2\pi f t + \phi)$.
For instance, in Figure 1a, the displacement [3] is


Hence, we obtain the position of $G$ using the following equation:

$$(g_x, g_y) = \arg\min_{g_x, g_y} \sum_{j \in \{a,b,c,d,e\}} \sum_{i=1}^{n_j - 1} e_{ji}^2 \quad (6)$$

$$d = l_1 - l_2 = \frac{v_a}{2\pi f}(\phi_2 - \phi_1) \quad (9)$$

where $\phi_1$ and $\phi_2$ are the calculated phases at $O_1$ and
$O_2$ respectively, and $v_a$ is the travelling speed of the acoustic
wave.
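
As a quick illustration of Eq. (9), the sketch below converts a phase difference into a displacement. The speed of sound of 343 m/s and the 19 kHz carrier are assumed values chosen for the example; the function name is hypothetical.

```python
import math

def displacement_from_phase(phi_1, phi_2, f, v_a=343.0):
    """Eq. (9): d = l_1 - l_2 = v_a / (2*pi*f) * (phi_2 - phi_1), in meters."""
    return v_a / (2.0 * math.pi * f) * (phi_2 - phi_1)

# One full phase cycle corresponds to one wavelength v_a / f.
one_cycle = displacement_from_phase(0.0, 2.0 * math.pi, f=19000.0)
print(f"{one_cycle * 100:.2f} cm per 2*pi of phase")
```

A phase error of a whole cycle therefore costs about one wavelength of displacement, which is why cycle slips in the phase tracker matter for positioning.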






























Fig. 4: Calculated displacement by PLL with different orders and
parameters, when the signal is weak. Panels: (a) first order, large
$k_1$; (b) first order, small $k_1$; (c) second order.


4.3 Preprocessing Received Signal 

Before tracking the phase $\phi$ from $r_1(t)$, we have to preprocess
the received signal. For the sent signal $s_1(t)$, the actual
received signal $r_{raw}(t)$ does not equal $r_1(t)$. Its amplitude
$A(t)$ always changes and it is also mixed with noises $\sigma(t)$.
We denote $r_{raw}(t) = A(t)\cos(2\pi f t + \phi(t)) + \sigma(t)$.
Hence, we first need to eliminate $A(t)$ and $\sigma(t)$ before
tracking the phase.

To eliminate the noise $\sigma(t)$, we let $r_{raw}(t)$ pass through
a Band Pass Filter (BPF). The processed signal is
$r_{filter}(t) \approx A(t)\cos(2\pi f t + \phi(t))$. $r_{filter}$ is
then processed by Automatic Gain Control (AGC) [16]. After that,
$A(t)$ is removed and the signal can be seen as $r_1(t)$ [3].
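
The preprocessing chain can be sketched as follows. The source does not specify the filter design, so this is only an assumed instance: a single second-order band-pass biquad (coefficients in the widely used audio-EQ-cookbook form) followed by a simple envelope-normalizing AGC. The sampling rate, carrier, Q factor and AGC rate are illustrative.

```python
import math

def bandpass(samples, f0, fs, q=10.0):
    """Second-order band-pass biquad centered at f0 (unity gain at f0)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha                      # b1 is zero for this filter
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    out, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for x in samples:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(y)
    return out

def agc(samples, rate=0.01, floor=1e-6):
    """Scale the signal to amplitude ~1 using a running mean of |x|."""
    out, level = [], floor
    for x in samples:
        level += rate * (abs(x) - level)
        # For a sinusoid of amplitude A, the mean of |x| is 2A/pi.
        out.append(x / max(level * math.pi / 2.0, floor))
    return out

fs, f0, n = 48000.0, 19000.0, 4800
tone = [0.05 * math.cos(2 * math.pi * f0 * i / fs) for i in range(n)]  # weak carrier
hum = [math.cos(2 * math.pi * 1000.0 * i / fs) for i in range(n)]      # loud audible noise
mixed = [a + b for a, b in zip(tone, hum)]
clean = agc(bandpass(mixed, f0, fs))
print(max(abs(v) for v in clean[n // 2:]))
```

After the initial transient, the loud 1 kHz component is strongly attenuated and the weak carrier is rescaled to roughly unit amplitude, which is the form the phase tracker expects.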

4.4 Tracking the Phase 

To track phase for inferring displacement, we adopt 
the second-order Phase Locked Loop (PLL) to track the 
phase when the smart device moves, rather than the 
ordinary first-order PLL. PLL is a classical method in 
signal processing and can be regarded as a device that 
tracks the phase and frequency of a sinusoid. In our 
design, it is implemented purely by software due to the 
limited capabilities of smartphone platform. 


Fig. 5: Design of the Second-Order Phase Locked Loop.

We show our design of PLL in Figure 5. The PLL contains three main
components: phase detector, loop filter and direct digital
synthesizer (DDS). The phase detector detects the difference
$\phi_e = \phi - \hat{\phi}$, where $\hat{\phi}$ is the estimate of
$\phi$. According to $\phi_e$, the loop filter analyzes and predicts
the offset $\gamma_2 + \gamma_3$ of $\hat{\phi}$ for the next cycle
of the loop, where the variation of $\gamma_2$ and $\gamma_3$ is
governed by the parameters $k_1$ and $k_2$ respectively. The DDS
updates the next $\hat{\phi}$ by adding the offset and prepares
$\gamma_1$ for the next phase detection.


In the phase detector in Figure 5, for the $n$th input $r_1(nT_s)$,
$r_1\gamma_1 = \sin(\phi - \hat{\phi}) - \sin(4\pi f T_s n + \phi +
\hat{\phi})$. Here $T_s$ denotes the sampling period of the received
signal. As $\phi_e = \mathrm{LPF}(r_1\gamma_1)$, where LPF is the low
pass filter, the high frequency component of $r_1\gamma_1$ is
eliminated and $\phi_e \approx \sin(\phi - \hat{\phi})$. If the phase
is locked (i.e., $\hat{\phi}$ is close to $\phi$),
$\sin(\phi - \hat{\phi}) \approx \phi - \hat{\phi}$. Hence
$\phi_e \approx \phi - \hat{\phi}$.

In Figure 5, the loop filter is the key part of the PLL. There have
been many proposals on the design of loop filters [1], and the type
and parameters of the loop filter should be carefully chosen for
different purposes. Here we adopt a second-order filter, i.e., the
proportional-plus-integrator [16] filter, as the loop filter. It uses
two updated variables $\gamma_2$, $\gamma_3$ and two constant
parameters $k_1$, $k_2$. In particular, if $k_2 = 0$, it degrades to
a first-order PLL.

We explain why the first-order PLL cannot be used in our case. When
the phone is static and the PLL becomes stable after several loop
cycles, $\phi_e \approx 0$ and $\hat{\phi}$ is close to constant.
Hence in Figure 5, $\gamma_2 \approx 0$ and $\gamma_2 + \gamma_3
\approx 0$, which implies $\gamma_3 \approx 0$. It means that $k_2$
can be eliminated and the first-order PLL is sufficient. However, if
the phone moves and we still use the first-order PLL, the performance
is good at high signal-to-noise ratio (SNR) but is limited by the
SNR. For instance, assume the user moves at a constant relative speed
and $\phi$ increases by $\Delta\phi$ per $T_s$, i.e., the cycle time
of the loop. When the PLL becomes close to stable, $\phi_e \approx
\Delta\phi + \phi_s$, where $\phi_s$ is the error caused by random
noises. If the SNR is high such that $\phi_s < \Delta\phi$, we can
set $k_1 > \Delta\phi$ to let $\hat{\phi}$ catch up with the
variation of $\phi$. However, as $k_1$ increases, the magnitude of
$\gamma_2$ increases, and $\hat{\phi}$ becomes unstable and tends to
be affected by noises. In the bad case, the estimate, which should
lock to $\phi$, tends to lock or has already locked to $\phi + 2\pi$
or $\phi - 2\pi$. We call this phenomenon the jitter for convenience.
The error of the corresponding displacement is
$2\pi \cdot \frac{v_a}{2\pi f} \approx 1.8$cm, which affects the
accuracy of position estimation in Section 3.

To show the limitation experimentally, we let the user hold the phone
for a while, move the phone forward to the speaker and backward three
times, and finally stop at the starting point. In Figure 4, we show
the result of PLL with different parameters when the acoustic signal
is weak, i.e., $l = 32$m. In Figure 4a, $k_1$ is large enough to
catch up with the real displacement. However, the loop is affected by
the noises and sometimes cannot lock when moving, which results in
occasional jitters on the up-and-down curve. Then the calculated
displacement from the start to the end, which should be close to 0,
accumulates to 17cm after a total moving length of about 100cm. On
the contrary, in Figure 4b, where $k_1$ is small, the calculated
phase cannot catch up with the real phase and jitters frequently when
moving. Hence, the first-order PLL is limited in supporting both
high-speed movement and high noise.

Fig. 6: Detection of the arrival time of the pulse. Panels: (a)
$m(t)$ of weak signal; (b) $m_1(t)$ of weak signal; (c) $m_2(t)$ of
weak signal; (d) $m_2(t)$ of good signal. Horizontal axes: time (s).

Therefore, to solve the above problem, the updated component
$\gamma_3$ is added, which turns the first-order PLL into a
second-order one. $\gamma_3$ can be seen as the phase variation
$\Delta\phi$ per $T_s$, which corresponds to the relative speed from
the phone to the speaker. If the PLL becomes stable, in each loop
cycle the loop filter predicts the next phase with the added
$\gamma_3$, which results in $\phi_e \approx \phi_s$ instead of
$\phi_e \approx \Delta\phi + \phi_s$. It means that a large $k_1$ is
no longer needed to let $\hat{\phi}$ catch up with the dynamic
$\phi$. Hence, $k_1$ can be much smaller, so that the PLL is more
robust to noises. In Figure 4c, we choose the second-order PLL by
setting $k_2 \neq 0$. Here $k_1$ is much smaller than the one in
Figure 4a, so the PLL is more robust to noises and does not cause
observable jitters. $k_1$ equals the one in Figure 4b, but the loop
has no problem catching up with the fast displacement because
$k_2 \neq 0$. The accumulated displacement error is less than 2cm,
which is roughly 9 times more accurate than the one in Figure 4a.
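
The difference between the two loop orders can be reproduced with a small software PLL. This is a simplified sketch, not the paper's implementation: it updates the loop once per block of 8 samples and uses the block average as the low pass filter, the carrier (18 kHz at 48 kHz sampling) is chosen so that the double-frequency term completes whole cycles within a block, the signal is noise-free, and the gains $k_1$, $k_2$ are illustrative values.

```python
import math

def track_phase(samples, f, fs, k1, k2, block=8):
    """Software PLL; returns the phase estimate (radians) after each block."""
    phi_hat = 0.0   # current phase estimate used by the synthesizer
    gamma3 = 0.0    # integrator state: predicted phase change per block
    tracked = []
    for start in range(0, len(samples) - block + 1, block):
        acc = 0.0
        for n in range(start, start + block):
            # Phase detector: r_1 * gamma_1 with gamma_1 = -2 sin(2*pi*f*t + phi_hat).
            gamma1 = -2.0 * math.sin(2.0 * math.pi * f * n / fs + phi_hat)
            acc += samples[n] * gamma1
        phi_e = acc / block            # block average acts as the low pass filter
        gamma2 = k1 * phi_e            # proportional path
        gamma3 += k2 * phi_e           # integrator path (absent when k2 == 0)
        phi_hat += gamma2 + gamma3
        tracked.append(phi_hat)
    return tracked

def simulate(k1, k2, f=18000.0, fs=48000.0, v_a=343.0, speed=1.0, duration=0.5):
    """Receiver approaches the speaker at constant speed; returns (estimate, truth) in m."""
    n_samples = int(duration * fs)
    samples = []
    for n in range(n_samples):
        moved = speed * n / fs                       # distance moved toward the speaker
        phi = 2.0 * math.pi * f * moved / v_a        # phase grows as the distance shrinks
        samples.append(math.cos(2.0 * math.pi * f * n / fs + phi))
    tracked = track_phase(samples, f, fs, k1, k2)
    estimate = tracked[-1] * v_a / (2.0 * math.pi * f)
    truth = speed * (n_samples - 1) / fs
    return estimate, truth

est2, true2 = simulate(k1=0.02, k2=0.005)   # second order
est1, true1 = simulate(k1=0.02, k2=0.0)     # first order with the same small k1
print(f"second order: error {abs(est2 - true2) * 1000:.3f} mm")
print(f"first order : error {abs(est1 - true1) * 1000:.1f} mm")
```

With the same small $k_1$, the first-order loop cannot follow the phase ramp and falls far behind, while the integrator path lets the second-order loop track it closely. Adding noise to `samples` would be the natural next experiment for comparing a large $k_1$ against a small one.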

5 Positioning by Synchronization 

Though we synthesize all the walking segments when the user walks and
turns, the problem is that the method accumulates errors when we
estimate a later position from the previous position, the estimated
walking directions and the walking steps. Especially when the user is
far away and loses the signal from the speaker for a long time, the
error increases and the historical measured position can no longer be
used. To solve this problem, we propose a synchronization mechanism
that leverages historical measurements to improve the robustness of
WalkieLokie.

In synchronization, we additionally encode periodic pulses
$s_2(t)$ in the transmitted signal and propose a demodulation
method to detect the receiving time of the pulses. Since the
pulses are periodic, the sending time of later pulses can be
predicted once we accurately estimate the sending time of one
pulse. Hence, using samples from which an accurate position can
be calculated directly, we obtain the estimated distance, which
gives the traveling time $t_l$ from the speaker to the phone.
Then, we detect the receiving time $\tau'$ of one pulse in these
samples and get the accurate sending time of that pulse,
$\tau = \tau' - t_l$. Furthermore, the sending time of the $i$th
later pulse is $\tau_i = \tau + iT$, where $T$ denotes the period
of the pulses. Hence, on obtaining the receiving time $\tau'_i$
of the $i$th pulse, we finally obtain the real-time distance from
$\tau_i$ and $\tau'_i$ instead of using the estimation method in
Section 3.
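
The bookkeeping described above can be sketched in a few lines. The function names are hypothetical, and the speed of sound is assumed to be 340m/s, the value used in the arithmetic elsewhere in this paper.

```python
SPEED_OF_SOUND = 340.0  # m/s, assumed constant

def sending_time(tau_rx, known_distance, v_a=SPEED_OF_SOUND):
    """Sending time of a pulse received at tau_rx from a known distance."""
    return tau_rx - known_distance / v_a

def distance_from_pulse(tau_rx_i, tau, i, period, v_a=SPEED_OF_SOUND):
    """Distance implied by the i-th later pulse received at tau_rx_i.

    The i-th pulse is assumed to be sent at tau + i * period.
    """
    tau_i = tau + i * period
    return v_a * (tau_rx_i - tau_i)

# Reference pulse: sent at 0.1s, received 5.1m away.
tau = sending_time(0.1 + 5.1 / 340.0, 5.1)
# Eighth later pulse (period 0.25s), received when the phone is 8.5m away.
later = distance_from_pulse(0.1 + 8 * 0.25 + 8.5 / 340.0, tau, 8, 0.25)
```

Once `tau` is known, every later pulse yields a distance without repeating the position estimation of Section 3.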

5.1 Pulse Modulation 

To design the synchronization pulses $s_2(t)$ and the detection
algorithm, several problems should be addressed:

• Each speaker should not take much acoustic bandwidth, so that
more speakers can be supported in the room. Hence, $s_1(t)$ and
$s_2(t)$ should share the same frequency band; otherwise
additional bandwidth for $s_2(t)$ is needed. Moreover, the
bandwidth of $s_2(t)$ needs to be narrow. This is challenging,
because $s_2(t)$ must occupy more bandwidth to be detected
reliably.

• $s_2(t)$ should also be usable for displacement tracking by
the PLL. Otherwise, the PLL will lose phase lock when
processing $s_2(t)$.

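Based on these requirements, we design $s_2(t)$:

$$s_2(t) = \begin{cases} \cos\left(2\pi f t + \pi \sin\frac{\pi (t - \tau_i)}{T_p}\right) & \tau_i \le t \le \tau_i + T_p \\ \cos(2\pi f t) & \text{otherwise} \end{cases} \qquad (10)$$

where we construct pulses starting at $\tau_1, \ldots, \tau_n$, and
the duration of each pulse is $T_p$.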
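
A minimal sketch of generating such a signal is given below. It assumes that the phase shift inside a pulse is the half-sine $\pi\sin(\pi(t-\tau_i)/T_p)$, which starts and ends at 0 and peaks at $\pi$. The function names and the numeric values in the demo are illustrative assumptions.

```python
import math

def phase_shift(t, pulse_starts, t_p):
    """Phase offset at time t: a half-sine bump inside a pulse, else 0."""
    for tau in pulse_starts:
        if tau <= t <= tau + t_p:
            return math.pi * math.sin(math.pi * (t - tau) / t_p)
    return 0.0

def s2(t, f, pulse_starts, t_p):
    """Carrier at frequency f whose phase is bumped during each pulse."""
    return math.cos(2 * math.pi * f * t + phase_shift(t, pulse_starts, t_p))

# Demo values (assumed): 19kHz carrier, one 7ms pulse starting at 0.1s.
F, T_P, STARTS = 19000.0, 0.007, [0.1]
mid_pulse = phase_shift(0.1035, STARTS, T_P)  # middle of the pulse
```

Outside a pulse the signal is the plain carrier, so it occupies the same band as the tracking tone. In the middle of a pulse the carrier is inverted, because the phase shift reaches $\pi$.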

More specifically, Figure 7 shows an example of detected pulses
when the user moves the phone forward and backward twice and
then stops. We encode three adjacent pulses per $T_2 = 0.25$s.
The three adjacent pulses can be seen as one compensated periodic
pulse with period $T = T_2 = 0.25$s. The time difference between
adjacent pulses is $T_3 = 0.03$s. In Figure 7a, the estimated
displacement is smooth and has no jitters whether the phone is
static or moving. We zoom in on the calculated phase to show the
performance of the PLL when there are pulses in $s_2(t)$: the
calculated phase is not locked to the real phase; instead, the
PLL appears not to have detected the pulses, and the phase
remains very smooth. Specifically, while the maximum variation
of the real phase is $\pi$, the corresponding variation computed
by the PLL is less than 0.4rad, which corresponds to a
displacement of about 1mm. The cause of this phenomenon is that
the parameters $(k_1, k_2)$ of the PLL are very small, so the PLL
does not track the fast-changing phase. Moreover, as the phase at
the beginning of a pulse equals the phase at the end and the
variation tracked by the PLL is small, the tracked phase finally
becomes stable and returns to its value at the beginning of the
pulse.

5.1.1 Proof on Properties of Modulated Pulse 

We prove that the pulse $s_2$ does not take much acoustic
bandwidth and has little effect on the result of displacement
tracking by the PLL.

First, the central frequency of $s_2$ is the same as that of
$s_1$, except that the phase changes when there is a pulse.
Hence, $s_1$ and $s_2$ share the same frequency band. Second,
since the bandwidth of the pulse is determined by $T_p$ [17], we
set $T_p = 0.007$s so that the bandwidth is about 460Hz. As the
minimum frequency is 17000Hz for the sound to be inaudible, and
the maximum frequency supported by the phone is 24000Hz, the
maximum number of concurrent signals that WalkieLokie supports
in one place is $(24000 - 17000)/460 \approx 15$. If the pulse
had a narrower bandwidth, WalkieLokie would support more
concurrent signals, but the pulse would become harder to detect.
How to modulate signals with narrower bandwidth and demodulate
them more accurately is left for future work. Third, the
component $s_3(t) = \pi \sin\frac{\pi (t - \tau_i)}{T_p}$ is the
phase shift of the sinusoidal signal. $s_3(t)$ starts and ends at
the same value 0, and its maximum value is $\pi$. Hence, in
theory the displacement is not affected by the pulse.
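
The channel count above is a simple division of the usable band by the pulse bandwidth. The helper below is a hypothetical name for this arithmetic; it ignores guard bands, which a real deployment would likely need.

```python
def max_concurrent_signals(f_min_hz, f_max_hz, pulse_bandwidth_hz):
    """Number of non-overlapping channels that fit into the usable band."""
    return int((f_max_hz - f_min_hz) // pulse_bandwidth_hz)

default_channels = max_concurrent_signals(17000, 24000, 460)
narrow_channels = max_concurrent_signals(17000, 24000, 230)
```

Halving the pulse bandwidth doubles the number of channels, which is the trade-off against detectability discussed above.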


5.1.2 Discussions of Pulse Modulation 

Choosing Parameters: There is a trade-off in choosing the
parameters $T_p$, $T_1$, $T_2$, $T_3$. We analyze each parameter
as follows:

a) $T_p$: As the bandwidth of the pulses is determined by $T_p$,
a smaller $T_p$ results in a wider bandwidth requirement and
fewer simultaneous signals in the same room. On the other hand,
a greater $T_p$ results in less accurate displacement tracking,
because the pulses are regarded as noise in displacement
tracking.

b) $T_1$: Since there are 3 adjacent pulses in one compensated
periodic pulse, $T_1 = T_2 - 3T_3$.

c) $T_2$: Recall that $T_2 = T$ is the period of the compensated
pulses. A smaller $T_2$ enhances the accuracy of measuring the
receiving time of pulses, because we have more pulses for
matching. However, if we choose a smaller $T_2$, we may face an
ambiguity problem. Specifically, denote the receiving time of a
pulse as $t_r$ and the sending time of the periodic pulses as
$t_s + kT_2$. The calculated distance is
$v_a(t_r - t_s - kT_2)$, where $k$ is an undetermined integer,
which also makes the distance undetermined. To get $k$, we
further leverage the maximum distance from speaker to anchor,
denoted as $l_m$. Since $v_a(t_r - t_s - kT_2) < l_m$, to get a
unique solution for the distance, $v_a T_2$ should be at least
$l_m$. In this paper, we assume that $l_m = 85$m, which gives
$T_2 = 0.25$s.

d) $T_3$: $T_3$ has a lower bound. First, to avoid overlaps
between adjacent pulses, $T_3 > T_p$. Second, there should also
be intervals between adjacent pulses. Zooming in on Figure 7a, we
find that after a pulse terminates the PLL needs a time longer
than the pulse duration to lock the displacement back to the real
value. Hence, if adjacent pulses are too close, the PLL may
become very unstable. On the other hand, since
$T_1 = T_2 - 3T_3 > 0$, increasing $T_3$ may force $T_2$ to
increase as well, which also affects the performance of
WalkieLokie.
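
The ambiguity argument for $T_2$ can be made concrete with a short sketch. Assuming the true distance is below $v_a T_2$, the integer $k$ is the number of whole periods contained in $t_r - t_s$. The function name and the numbers in the demo are illustrative assumptions.

```python
import math

def resolve_distance(t_r, t_s, t2, v_a=340.0):
    """Return (k, distance) with the distance in [0, v_a * t2).

    Pulses are assumed to be sent at t_s + k * t2. The result is
    unambiguous only if the true distance is below v_a * t2.
    """
    k = math.floor((t_r - t_s) / t2)
    return k, v_a * (t_r - t_s - k * t2)

# Pulse number 7 (sent at 0.1 + 7 * 0.25 s) heard by a phone 30m away.
k, dist = resolve_distance(0.1 + 7 * 0.25 + 30.0 / 340.0, 0.1, 0.25)
max_range = 340.0 * 0.25  # largest unambiguous distance for T_2 = 0.25s
```

With $T_2 = 0.25$s the unambiguous range is 85m, which matches the assumed $l_m$.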

Fig. 7: Calculated displacement and pulses from the same signal.
(a) Measured Displacement. (b) Detected Pulses.

Reducing Signal Conflicts: As explained earlier, due to the
bandwidth limitation, our default pulse modulation parameters
support 15 concurrent signals. To reduce signal conflicts,
further optimizations can be made for different applications:

a) Virtual business card sharing: In this case, users are
usually close to each other, so we can narrow the bandwidth of
the synchronization pulses. WalkieLokie can then support more
users who broadcast signals simultaneously, at the cost of
reduced accuracy in pulse detection and long-distance
positioning, neither of which is much needed here.

b) Virtual shopping guide: If more shopping guides are required,
we suggest using only a few speakers for normal indoor
localization instead of pure relative positioning. Our further
evaluation in Section 6.4 shows that WalkieLokie supports an
unlimited number of shopping guides with a simple and sparse
deployment of speakers, i.e., the smart device only receives
signals from 2 speakers on average, but achieves 1-meter
accuracy.

5.2 Pulse Detection 

We discuss how we detect the receiving time $\tau'_i$ of the
$i$th pulse by leveraging the component $s_3(t)$. Assuming the
phase locked by the PLL is $\phi_r$ before the pulse starts, the
expected pulse is $\hat r(t) = \cos(2\pi f t + \phi_r + s_3(t))$.
Hence, for the received samples $r(kT_s)$, we compute the
likelihood

$$m(kT_s) = \sum_i \hat r(iT_s)\, r(iT_s),$$

where the sum runs over the samples of one pulse duration $T_p$
starting at $kT_s$. When $m(kT_s)$ reaches its maximum, the
corresponding $kT_s$ is the starting time of the received pulse.
Note that if we set the expected pulse to
$\tilde r(t) = \cos(2\pi f t + \phi_r + \pi)$ and there is no
pulse during the next $T_p$, so that
$r(t) = \cos(2\pi f t + \phi_r)$, then
$m(kT_s) = \sum_i \tilde r(iT_s)\, r(iT_s)$ reaches its minimum.
In fact, $s_3(t)$ is a filtered version of the rectangular pulse
$\tilde s_3(t) = \pi$, so that $s_3(t)$ has a narrower bandwidth.
Accordingly, $\hat r(t) \approx \tilde r(t)$, which means $m(t)$
reaches a value close to the minimum when there is no pulse in
the next $T_p$. Hence, the arrival time $\tau'_i$ of the pulse
can be detected from $m(t)$.
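
The matching step can be sketched as a sliding correlation. To keep the example deterministic, the sketch makes several assumptions that differ from the real-valued correlation described above: it works on noise-free complex baseband samples (carrier already removed), it takes $\phi_r = 0$, and it models the pulse phase as the half-sine $\pi\sin(\pi i/N)$ over $N$ samples. All names are hypothetical.

```python
import cmath
import math

def pulse_template(n_pulse, phi_r=0.0):
    """Baseband samples exp(j * (phi_r + s3)) of the expected pulse."""
    return [cmath.exp(1j * (phi_r + math.pi * math.sin(math.pi * i / n_pulse)))
            for i in range(n_pulse)]

def likelihood(rx, tmpl):
    """m[k] = Re(sum_i rx[k + i] * conj(tmpl[i])) for every full window."""
    n = len(tmpl)
    return [sum((rx[k + i] * tmpl[i].conjugate()).real for i in range(n))
            for k in range(len(rx) - n + 1)]

def detect_pulse_start(rx, tmpl):
    """Index of the window that best matches the template, plus all scores."""
    m = likelihood(rx, tmpl)
    k_best = max(range(len(m)), key=m.__getitem__)
    return k_best, m

# Noise-free demo: a constant-phase carrier with one pulse at sample 300.
N_PULSE, TRUE_START = 100, 300
tmpl = pulse_template(N_PULSE)
rx = [1 + 0j] * 600
rx[TRUE_START:TRUE_START + N_PULSE] = tmpl
k_hat, m = detect_pulse_start(rx, tmpl)
```

In this setting the score peaks at the true start and is negative where no pulse is present, which is qualitatively the behaviour of the peak and bottom values of $m(t)$ described in the text.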

5.2.1 Analysis on Design of Pulse Detection 

As mentioned earlier, our PLL treats the pulses in $s_2(t)$ as
noise and only tracks $s_1(t)$. This has two advantages: 1) the
pulses have very small effects on the tracked displacement;
2) because the tracked variation is very small and $\phi_r$
remains stable when there are pulses, the peaks of $m(t)$ are
easy to detect. In Figure 7b, $m(t)$ reaches the peak value
(i.e., 150) when there is a pulse at $t$, and the bottom value
(i.e., -50) when no pulse is present. As a whole, this shows an
interesting result: when demodulating $s(t)$, the peaks of $m(t)$
are very clear for synchronization in Figure 7b, while the
corresponding calculated phase is very smooth for displacement
tracking in Figure 7a.

We also find that when the phone is static, the peaks
corresponding to the pulses are clear. However, they are unclear
when the phone is moving. Furthermore, when the signal is weak,
the periodic peaks cannot be detected from $m(t)$ in Figure 6a
due to noise. Hence, we take a further step to make the peaks
clearer when the phone moves or the signal is weak. The solution
is based on the observation that the expected peaks still appear
at the expected times, even though they are buried in the noise,
whereas random peaks are unlikely to appear periodically. Hence,
we compute $m_1(t) = m(t - T_3) + m(t) + m(t + T_3)$ in Figure
6b, where the peaks are easier to identify. Then, we compute
$m_2(t) = m_1(t - T_2) + m_1(t) + m_1(t + T_2)$ in Figure 6c,
where the peaks can be easily detected. Moreover, when the phone
is moving and the corresponding phase is as in Figure 7a, the
peaks are also very clear in Figure 6d.
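
The two summation steps can be sketched as follows, assuming the first step adds values one pulse spacing ($T_3$) apart and the second adds values one period ($T_2$) apart. Shifts are given in samples, values outside the recording are treated as zero, and the toy numbers in the demo are not the parameters of this paper.

```python
def shifted_sum(values, shift):
    """out[n] = values[n - shift] + values[n] + values[n + shift]."""
    total = len(values)

    def at(n):
        return values[n] if 0 <= n < total else 0.0

    return [at(n - shift) + at(n) + at(n + shift) for n in range(total)]

def enhance_peaks(m, t3_samples, t2_samples):
    """Combine the three adjacent pulses, then three consecutive periods."""
    m1 = shifted_sum(m, t3_samples)
    m2 = shifted_sum(m1, t2_samples)
    return m1, m2

# Toy likelihood: three pulses spaced 3 samples apart, repeated every 25.
m = [0.0] * 100
for base in (10, 35, 60):
    for j in range(3):
        m[base + 3 * j] = 1.0
m1, m2 = enhance_peaks(m, t3_samples=3, t2_samples=25)
strongest = max(range(len(m2)), key=m2.__getitem__)
```

A unit peak that recurs at the expected spacings is amplified ninefold at the middle pulse of the middle period, while an isolated random peak gains nothing from its neighbours.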

5.2.2 Dealing With Multipath Effects: 

We also find that the result of synchronization is affected by
multipath effects, especially when the smart device is static.
Hence, we further study and improve pulse detection.

Fig. 8: Pulse detection in case of multipath effects. (a) Good
Case. (b) Bad Case (Multipath).

We find that when the phone is static, another property can be
leveraged: the distance from the smart device to the dummy
speaker is constant. Hence, we can use

$$m_3(kT_s) = \sum_{i \in \{x \mid x \equiv k \pmod{T_2/T_s}\}} m_2(iT_s),$$

which sums $m_2(t)$ over all pulse periods and makes the detected
times of the pulses clearer. The resulting $m_3$ is shown in
Figure 8. In Figure 8a, where there is no multipath effect, there
are 3 pulses in a period $T_2$. However, in Figure 8b, which was
gathered in the shopping mall, there are at least 9 pulses, which
means there are 2 additional paths reflected from walls or other
objects. In this case, each of the 3 paths is a candidate for the
direct path from the dummy speaker.
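
Folding the likelihood by the pulse period can be sketched as below. The function name is hypothetical, and the period is assumed to be a whole number of samples.

```python
def fold_by_period(m2, period_samples):
    """Sum all samples that share the same offset within the period."""
    folded = [0.0] * period_samples
    for i, value in enumerate(m2):
        folded[i % period_samples] += value
    return folded

# Three periods of four samples; the pulse always sits at offset 1.
folded = fold_by_period([0, 1, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0], 4)
```

When the phone is static, every period contributes to the same offsets, so direct and reflected paths show up as separate, reinforced peaks within a single period.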

After recognizing the possible paths, we take a further step to
pick out the direct path. Specifically, we use the result of the
PLL, which corresponds to the displacement. As displacement
tracking is less affected by multipath effects, we compare the
results of the PLL and of pulse detection when a user walks from
one position and stops at another. Denote the displacement given
by the PLL as $d$, and the receiving times of the pulses at the
start point and the end point as the sets
$T_a = \{t_{a1}, t_{a2}, t_{a3}, \ldots\}$ and
$T_b = \{t_{b1}, t_{b2}, t_{b3}, \ldots\}$ respectively. We then
obtain the receiving times

$$(t_a, t_b) = \arg\min_{t_a \in T_a,\, t_b \in T_b} \left| (t_a - t_b) v_a - d \right|.$$
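
This selection is an exhaustive search over candidate pairs, sketched below. The function name and the candidate times are made up for illustration, and $v_a$ is assumed to be 340m/s.

```python
from itertools import product

def pick_direct_path(t_a, t_b, d, v_a=340.0):
    """Pair (ta, tb) whose implied distance change best matches d."""
    return min(product(t_a, t_b),
               key=lambda pair: abs((pair[0] - pair[1]) * v_a - d))

# Candidate receiving offsets (seconds) at the start and end points,
# with a PLL-tracked displacement of 2.0m toward the speaker.
best = pick_direct_path([0.010, 0.014, 0.020], [0.004, 0.0081, 0.016], 2.0)
```

Reflected paths usually change length by a different amount than the direct path when the user moves, so the pair that agrees with the PLL displacement is taken as the direct one.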

5.3 Positioning after Synchronization 

Assume the sending time of the next pulse is $t_s$, which is the
result of synthesizing, and the detected receiving time is $t_r$.
Then the distance is $l = v_a(t_r - t_s)$ and the distance in the
horizontal plane is $L = \sqrt{l^2 - h^2}$. For direction
estimation, we first calculate $x$ and $y$ using the newly
obtained $L$, the previous $s$ and $d_i$. For example, when
calculating $\psi_1$ in Figure 2, assume $l_1$ is obtained from
synthesizing. Since $x = -l_1\cos\psi_1$ and
$y = l_1\sin\psi_1$, $l'_i$ in Eq. (1) has the following form:

$$l_i'^2 = l_1^2\sin^2\psi_1 + \left(-l_1\cos\psi_1 + (i-1)s\right)^2 \qquad (11)$$

Hence, $\psi_1$ is obtained by an $\arg\min$ over $\psi_1$ of the
error term calculated by Eqs. (2) and (11), and $\cos\psi_i$ then
follows from the estimated geometry.
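
The direction search can be illustrated with a simple grid search. This is only a sketch under stated assumptions: the squared-error cost stands in for the error term of Eq. (2), which is not reproduced here; the per-step distances are assumed to be available; $s$ is treated as the stride length; and all function names are hypothetical.

```python
import math

def horizontal_distance(l, h):
    """Project the slant distance l onto the horizontal plane."""
    return math.sqrt(l * l - h * h)

def predicted_distance(l1, psi1, step, i):
    """Distance to the speaker before the i-th step (i = 1 is the start)."""
    x = -l1 * math.cos(psi1) + (i - 1) * step
    y = l1 * math.sin(psi1)
    return math.hypot(x, y)

def estimate_psi1(l1, step, measured):
    """Grid search for psi1 over 0..180 degrees in 0.1-degree steps.

    measured[i - 1] is the distance before step i. The squared-error
    cost is an assumption, not the paper's Eq. (2).
    """
    best_psi, best_cost = 0.0, float("inf")
    for tenth_deg in range(0, 1801):
        psi = math.radians(tenth_deg / 10.0)
        cost = sum((predicted_distance(l1, psi, step, i + 1) - meas) ** 2
                   for i, meas in enumerate(measured))
        if cost < best_cost:
            best_psi, best_cost = psi, cost
    return best_psi

# Synthetic walk: 10 steps of 0.6m, speaker 5m away at psi1 = 60 degrees.
true_psi = math.radians(60)
measured = [predicted_distance(5.0, true_psi, 0.6, i) for i in range(1, 11)]
psi_hat = estimate_psi1(5.0, 0.6, measured)
```

On noise-free synthetic data the search recovers the true angle; with real measurements the minimum would only be approximate.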

6 Performance Evaluation 

In this section, we evaluate the system using two types of
speakers: a Samsung Galaxy Note 2 and normal dummy speakers. The
speaker merely broadcasts acoustic waves and does not perform any
communication. We mainly use a Google Nexus 4 to receive the
acoustic signals. We do not modify the phone or jailbreak the
operating system, and all the components, such as the BPF, AGC
and PLL, are implemented in software. We evaluate the
performance in an empty room, an office and a shopping mall.
Micro benchmarks are conducted for position estimation and
synchronization. We then evaluate the overall performance with
all components in use.

Note that in our system, we do not measure the walking direction
in the World Coordinate System (WCS). The main reason is that
this measurement is not needed in our system: we only calculate
the relative position between the walker and the speaker.
Furthermore, accurately measuring the walking direction in WCS is
still challenging [15], mainly because of unpredictable compass
errors, especially in indoor environments. Therefore, to evaluate
the accuracy of our system, we build and rely on a relative
coordinate system (RCS) instead of the classical WCS. In RCS, we
set the direction of the piecewise linear segment as the X axis,
and the starting point of the segment as the origin. For
instance, assuming $Y = \sqrt{y^2 - h^2}$ and $X = -x$ in Figure
2, $(X, Y)$ is the position of the speaker when the user starts
walking.

6.1 Position Estimation 

We evaluate position estimation under several factors that may
affect the accuracy of the estimation, i.e., different relative
positions between the phone and the speaker, number of walking
steps, users, orientation of devices, device diversity and
environments.


6.1.1 Positions 

We conduct this evaluation in an empty room to assess the
performance at different places. In this experiment, the speaker
is placed at different locations, i.e., $X$ = 2, 4, 6, 8m and
$Y$ = 2, 4, 6, 8m. We let the user walk for 9 to 10 steps, with a
walking length of about 6m. The relative height $h$ is about
0.3m. For each location, the user holds the phone in hand and
walks 35 times to gather samples, i.e., we get 560 samples in
this micro benchmark. Note that with other smart devices, such as
smart glasses or smart watches, the user could have a more
comfortable experience. Since we only use IMU sensors and a
microphone, which are common in smart devices, we use a
smartphone as the smart device in the experiment. Then, we
calculate the relative position to evaluate the accuracy of the
calculated distance $L = \sqrt{X^2 + Y^2}$ and direction
$\psi = \arccos(X/L)$.

Note that since the user walks for only several steps and the
walking distance is short, we only evaluate the accuracy of the
initial position $(X, Y)$.
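
For reference, the conversion from the relative coordinates to the evaluated distance and direction is sketched below. The function name is hypothetical.

```python
import math

def relative_polar(x, y):
    """Distance L and direction (degrees) of the speaker at (x, y)."""
    dist = math.hypot(x, y)
    return dist, math.degrees(math.acos(x / dist))

dist_44, angle_44 = relative_polar(4.0, 4.0)
```

For the test position $(X, Y) = (4, 4)$ used in several experiments, this gives a distance of about 5.66m and a direction of 45 degrees.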

In Figure 9a, the accuracy of distance estimation is very similar
for different $X$. We further study the distribution of large
errors in Figure 9b. We find the interesting fact that the errors
grow with $Y$: when $Y$ = 2, 4, 6, 8m, the corresponding
80th-percentile errors are 0.35m, 0.55m, 0.97m and 1.88m. This
result is acceptable in our case, because the user requires a
higher level of accuracy when s/he is close to the speaker.
Furthermore, for longer distances we rely on the synchronization
and synthesizing scheme, rather than this position estimation
scheme, to achieve accurate ranging.

For direction estimation, the result remains very accurate when
$X$ or $Y$ increases, as shown in Figures 9c and 9d. In total,
the mean ranging and angle errors are 0.63m and 2.46° respectively.

6.1.2 Number of Steps 

The accuracy of the position estimation depends on the number of
walking steps $n_s$. We compare the results when the user walks
for a smaller number of steps in Figures 9e and 9f. The samples
are the same ones gathered in Section 6.1.1; the only difference
is that we use only a part of each sample, which corresponds to
fewer walking steps. The results show that the ranging errors
increase quickly when $n_s$ decreases. The reasons are: 1) the
user's stride length varies occasionally; 2) the user's phone
also shifts left and right regularly, i.e., it does not move
strictly along a line when the user holds the phone and walks.
As these factors have less effect on the accuracy when $n_s$ is
larger, we expect that the accuracy will continue to improve when
$n_s > 10$, though it is already very accurate when $n_s = 10$.

The estimated direction is also affected by a smaller $n_s$ in
Figure 9f. But it is still acceptable: the angle errors are under
8° at the 80th percentile when $n_s = 6$. As a whole, when $n_s$
is small, the accuracy is sufficient for estimating the direction
of a surrounding speaker, while the later synthesizing scheme is
required to obtain an accurate distance.

Fig. 9: The accuracy of ranging and direction finding 1) when the
user starts walking at different positions (a)(b)(c)(d), 2) when
the user walks for smaller number of steps (e)(f). Panels:
(a) Ranging, (b) Ranging, (c) Direction, (d) Direction,
(e) Ranging, (f) Direction.

Fig. 10: The mean and standard deviation of ranging and direction
estimation for different users (a)(b)(c), placements (d)(e), and
smart devices (f)(g). Panels: (a) Stride Length, (b) Ranging,
(c) Direction, (d) Ranging, (e) Direction, (f) Ranging,
(g) Direction.

6.1.3 Users 

Different users have different stride lengths and motions when
walking, which causes variation in the displacement patterns
$d_i$ and might affect the positioning result. Hence, we recruit
8 volunteers in this experiment: each user walks along a line of
about 6m 35 times, where $(X, Y) = (4, 4)$.

We have the following observations in Figure 10: the standard
deviations (std) of the ranging and direction are small for most
users. In Figure 10a, persons 1, 2, 4, 6, 7 have small stride
lengths while the rest have larger ones, but the results are
similar for all the users (except for persons 6 and 7).

The results imply that each user's stride length is very stable
and that the positioning accuracy is not much affected by
variation in stride length, even though the stride length may
differ considerably between users.

6.1.4 Orientation of Speaker and Microphone 

We consider the cases in which the speaker or the microphone
faces different directions: (1) (default) the microphone faces
the sky, and the speaker faces the walking line; (2) the
microphone faces the front; (3) the microphone is perpendicular
to the walking direction and faces the speaker; (4) the
microphone faces the ground; (5) the microphone is perpendicular
to the walking direction and the speaker is behind the
microphone; (6) the speaker faces the ground. The results in
Figures 10d and 10e show that the std is small in all cases and
the result is very stable.

We also find that the mean value of the distance increases when
the signal is weaker, in cases (2) and (4), and decreases when
the signal is stronger, in case (3). The reason is that when the
signal is weak, the PLL loses some of the signal and the tracked
displacement decreases, which makes the calculated distance
larger. Hence, based on our measurements in displacement
tracking, we calibrate the displacement calculated by the PLL.
More specifically, in case (1), the displacement derived from the
tracked phase shift $\Delta\phi$ is scaled by a factor of 1.22 if
$d > 0$, and by 1.69 if $d < 0$. Note that we calibrate with a
constant factor (i.e., 1.22) because the environment has a
limited effect on the result of the PLL when the signal is strong
enough. However, when $d < 0$, which means the speaker is behind
the walking user, $d$ is usually not used for position estimation
if the tracked phase is abnormal (e.g., when WalkieLokie cannot
detect pulses from the phase).

6.1.5 Device Diversity 

We test several Commercial Off-the-Shelf (COTS) smart devices as
acoustic receivers: (1) Nexus 4, (2) Samsung Galaxy Note 2,
(3) Nexus 7. We choose $(X, Y) = (4, 4)$ as the starting point of
walking, and the error of position estimation is shown in Figures
10f and 10g. The result shows that these smart devices have
similar performance.

We also use normal dummy speakers as acoustic speakers in our
experiment in a large shopping mall, since we consider the case
where normal speakers serve as virtual shopping guides.

Calibration of clock drift: We observe an interesting phenomenon:
unlike the previous smart devices, the normal speaker has a
serious clock drift and needs to be calibrated. For instance,
when a speaker is supposed to broadcast a signal at 19000Hz, the
actual received signal is at 19007Hz. Even if the frequency drift
is only 0.1Hz, the error of distance measurement is about
600 × 340 × 0.1/19000 = 1.07m when the smart device performs
synchronization for 10 minutes. To solve this problem, our PLL
design measures the precise clock offset while the receiver is
static for only a few seconds. In this case, $\gamma_2$ in Figure
5 rapidly converges to a constant value. As $\gamma_2$ equals the
phase shift per sampling time $T_s$, the frequency offset equals
$\frac{\gamma_2}{2\pi T_s}$. Hence, once we let the smart device
be static for a few seconds, the precise frequency offset is
obtained. Afterward, we calibrate the clock drift in real time
using this constant frequency offset, which removes the clock
drift.
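
The two relations used in this paragraph can be checked numerically. The helper names are hypothetical; the speed of sound is taken as 340m/s and the sampling rate as 44100Hz, as elsewhere in this paper.

```python
import math

def drift_ranging_error(freq_drift_hz, carrier_hz, duration_s, v_a=340.0):
    """Ranging error accumulated by an uncorrected frequency drift."""
    return duration_s * v_a * freq_drift_hz / carrier_hz

def frequency_offset(phase_step_rad, sample_period_s):
    """Frequency offset implied by a constant phase step per sample."""
    return phase_step_rad / (2 * math.pi * sample_period_s)

ten_minute_error = drift_ranging_error(0.1, 19000.0, 600.0)
# A 7Hz offset at 44100Hz sampling is a phase step of 2*pi*7/44100 rad.
offset = frequency_offset(2 * math.pi * 7.0 / 44100.0, 1.0 / 44100.0)
```

The first value reproduces the 1.07m figure above, and the second recovers the 7Hz offset of the 19007Hz example from the converged phase step.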

6.1.6 Environments 

We compare the accuracy of position estimation in the empty room
and at different locations in the office, and find that the
results are similar. We further evaluate the effects of the
environment in a shopping mall in Section 6.4.

6.2 Synchronization 

In Figure 11, we choose 8 locations in an empty room and the
office to evaluate the performance of synchronization. For
example, E32 means that the experiment is in the empty room and
the distance from the smart device to the speaker is 32m, and O16
means that it is in the office and the distance is 16m. At each
position we test two cases: the phone is static, or it moves back
and forth without stopping. For each case, the phone records
audio for 100 seconds, which means there are 400 signals for
synchronization in the samples. Then, we evaluate the accuracy of
pulse detection. To make the results easier to interpret, the
error of arrival time $t_e$ is converted to the distance
measurement error $l_e = v_a t_e$. For instance, if the error is
the time interval of 1 acoustic sample, i.e.,
$t_e = \frac{1}{44100}$s, the corresponding distance error is
$l_e \approx 0.8$cm.



Fig. 11: (a) Percentage of successful experiments at different
locations (80cm criterion). (b) Standard deviation.

Since we find that there are occasional significant errors
(> 3m), we first set a threshold $l_t = 80$cm and evaluate the
ratio of successful detections, i.e., those with $l_e < l_t$. In
Figure 11a, the successful detection rate is above 80% for most
cases when the phone is static. When the phone is moving, the
performance is good as well if the distance is within 24m in the
empty room and 8m in the office. In some cases the rate is close
to 100%.

There is also an exception: at location E8, when the phone is
static, the rate is only 61.0%, while it reaches 100% at the same
place when the phone is moving. We therefore conducted the
experiment again at the same place, and the result was close to
the previous one. We suppose it is caused by multipath effects:
when the phone is static, the phase follows the mixed signals and
becomes stable, which affects the result of pulse matching. The
reason for the high success rate with a moving phone is that,
although the signal is also affected by multipath, the phases of
the reflected signals at different positions are irregular. In
other words, the PLL locks to the phase of the signal arriving
directly from the speaker, i.e., the multipath signal is regarded
as noise by the PLL. Hence, the performance is better when the
phone is moving. We find that locations E4, E8 and E16 show the
same phenomenon, which supports our hypothesis. This is in fact a
good result for WalkieLokie: when the user is walking, the
synchronization result is very good and can be directly used for
synthesizing; when the user is static, as the successful
detection rate is above 60%, WalkieLokie collects enough samples
and then determines the most likely receiving time. In Figure
11b, we show the standard deviation of the results for successful
detections. The std is around 10cm in most cases, except that it
is 30.9cm and 49.2cm when the phone is moving at O4 and O16
respectively.

6.3 Positioning after Synchronization 

We evaluate the performance of WalkieLokie using both position
estimation and synchronization in the following steps:

1) The user walks in a line where the initial coordinate of the
speaker is (4, 4). In this step, we calculate the distance
through position estimation and then calculate the sending time
of the periodic signals $s_2(t)$ by synchronization.

2) The user then turns, walks and stops at the position where
the relative coordinate of the speaker is $(X, Y)$.

3) The user walks again for about 6m. The position, which is
supposed to be $(X, Y)$, is then computed according to the
sending time and the samples received during this short walk.

Fig. 12: Accuracy of positioning by synchronization. (a) Ranging.
(b) Direction Estimation.

We conduct the experiment in the empty room and the office.
Specifically, we set $(X, Y)$ = (4, 12) and (4, 20) in the empty
room, and (4, 8) and (4, 16) in the office, to gather the
samples.

In Figure 12a, the ranging errors are under 0.32m and 0.66m at
the 80th and 90th percentiles for most cases, which means that
both position estimation and synchronization achieve considerable
accuracy. There are also occasional errors greater than 2m in
each case, caused by multipath effects in synchronization. In
particular, for the case of $Y = 12$m in the empty room, such
large errors account for 12% of the samples. The corresponding
results at E8 and E16 in Figure 11a show that the successful
detection rate there is also much lower than in other cases of
synchronization. Since the successful detection rate in
synchronization is above 80% for most cases, the result would
converge to the correct value given enough time, and the abnormal
results would be eliminated. Hence, we conclude that the ranging
results are very good in these cases.

6.4 Putting it All Together in a Severe Environment

We evaluate WalkieLokie in a shopping mall, where the environment
is quite severe for acoustic-based systems: the shopping mall
itself broadcasts loud audio, and there are always people walking
around who block the line of sight to the speakers or block the
way, forcing us to change walking direction. Furthermore, since
setting up speakers on the ceiling (which might give better
results) and debugging them frequently could disturb business, we
only put the speakers at the side of the aisles, as shown in
Figures 13 and 14a. Hence, our system has to deal with serious
NLoS effects.

Fig. 13: Map of the shopping mall (legend: dummy speakers, test
points, walking area; marked width 35.2m).

We evaluate the performance of positioning in two cases:
a) relative positioning by one speaker, and b) absolute
positioning by 5 speakers (like normal indoor localization). We
choose a 35m × 17m area (about 600m²) in Figure 13, and put 5
normal dummy speakers in this area. Each speaker broadcasts
signals at a different central frequency; the signals are
inaudible and go unnoticed by surrounding customers. We emulate
the behavior of normal shoppers in the evaluation: the
experimenter stands at a test point and walks a few steps (less
than 6m) in a line; then he stops or turns and continues walking,
and so on. We gather 8 samples per point. Hence, we can evaluate
the performance when leveraging all the walking segments to get
the position.

We set the central frequencies of the speakers to 17000Hz,
18000Hz, 19000Hz, 20000Hz and 21000Hz, respectively. The smart
device differentiates the signals using the BPF sub-component in
Figure 1b. For example, to analyze the signal of the second
speaker (18000Hz), we set the passband of the BPF around 18000Hz,
so that the other signals are blocked.
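
The idea of separating speakers by their central frequency can be illustrated without a full band-pass filter. The sketch below swaps in the Goertzel algorithm, which measures the signal power at a single frequency; it is not the BPF used in this paper. The function name, block length and test tone are illustrative assumptions.

```python
import math

def goertzel_power(samples, freq_hz, sample_rate_hz):
    """Squared DFT magnitude at the bin nearest to freq_hz (Goertzel)."""
    n = len(samples)
    k = round(n * freq_hz / sample_rate_hz)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

# 10ms of a pure 18000Hz tone sampled at 44100Hz (441 samples).
FS, N = 44100, 441
tone = [math.cos(2 * math.pi * 18000 * i / FS) for i in range(N)]
powers = {f: goertzel_power(tone, f, FS)
          for f in (17000, 18000, 19000, 20000, 21000)}
loudest = max(powers, key=powers.get)
```

With this block length each speaker frequency falls exactly on a DFT bin, so the tone contributes almost nothing to the other four channels.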



Fig. 14: (a) Shopping mall and the dummy speaker. (b) Position
errors of relative positioning (using 1 speaker) and absolute
positioning (using 5 speakers).

The results show that these 5 speakers perform very differently
in relative positioning, though they are the same product model.
The signal of the speaker at 17000Hz only covers 13% of the area,
whereas the signals of the speakers at 19000Hz and 20000Hz cover
about 54% and 51% of the area. This diversity may be caused by
several factors: anchor positions, quality of different anchor
speakers, etc. We leave the study of speaker configuration to
future work. Overall, the average coverage per speaker is 38%,
which is about 222m² in our specific area.

We show the relative position errors when using one speaker in
Figure 14b. Note that we only calculate the accuracy of the
relative position where the starting point is covered by the
signal of the speaker. Though we can still estimate the position
from historical positioning results when there is no signal, we
exclude such cases and report only the direct results. The
results show that for one speaker, the position errors are under
1.2m and 2m at the 50th and 80th percentiles. The mean error of
relative positioning is 1.28m.

We also explore the localization capability when all 5 speakers
are used as anchors. We evaluate the errors at all points, and
the results show that the position errors are under 1.5m at the
90th percentile. Since the average coverage per speaker is 38%,
the smart device can receive audio from 38% × 5 ≈ 2 speakers on
average. As expected, the accuracy is better when multiple
signals are used for localization.

6.5 Overhead 

The computation overhead is mainly caused by 3 components:
displacement tracking (including BPF, AGC, and PLL),
pulse detection, and position estimation. We run
WalkieLokie using MATLAB R2013a on Mac OS with a
3.1GHz Intel Core i5 CPU. For 1 second of received
samples, phase tracking, pulse detection, and position
estimation take 0.09s, 0.12s, and 0.05s, respectively.
There is a trade-off between overhead and accuracy.
For example, we can use an infinite impulse response (IIR)
BPF instead of a finite impulse response (FIR) BPF, which
reduces the computation overhead significantly but incurs
larger errors. For smart devices, we recommend sending
the recorded samples to a cloud server and obtaining the
result from the cloud, which requires much less
computation on the device and consumes little energy.
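
To put the per-stage timings in perspective, the sketch below sums them into a real-time factor (processing time divided by audio duration). The stage timings are the desktop measurements quoted above. The function and variable names are illustrative, and the numbers say nothing about how fast a phone would run the same pipeline.

```python
# Per-stage processing time (seconds) for 1 second of received samples,
# as reported for the desktop MATLAB implementation.
STAGE_SECONDS = {
    "phase tracking": 0.09,
    "pulse detection": 0.12,
    "position estimation": 0.05,
}


def real_time_factor(stage_seconds: dict, audio_seconds: float = 1.0) -> float:
    """Total processing time per second of audio.

    A value below 1.0 means the pipeline keeps up with the incoming audio
    on the machine where the timings were taken.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return sum(stage_seconds.values()) / audio_seconds


if __name__ == "__main__":
    rtf = real_time_factor(STAGE_SECONDS)
    print(f"real-time factor: {rtf:.2f}")
    print("keeps up with real time" if rtf < 1.0 else "falls behind real time")
```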

7 Related Work 

7.1 Ranging 

Many localization systems are based on ranging [2], [7],
[8], [13]. They achieve considerable ranging accuracy,
but require special hardware for synchronization.
Specifically, the sender records the sending time of the
signal used for ranging, while the receiver detects the
arrival time of the signal. Each device measures its
sending or arrival time independently, without reference
to timing information on other devices. Hence,
synchronization among devices is needed. In the Bat
system [2], the base station uses a radio channel for
synchronization. Cricket [13] uses a special device to
send an RF signal and an ultrasound signal at the same
time; the receiver then obtains the distance from the
difference between the travel times of the two signals.
Guoguo [7] uses RF signals to synchronize all the
acoustic anchors, so that the location can be obtained
from the differences in the receiving times at the phone.
BeepBeep [12] calculates the distance between two phones.
It solves the synchronization problem by letting both
phones emit acoustic signals and exchange the sending and
receiving times via a wireless channel.
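
The RF-plus-ultrasound idea can be illustrated with a short calculation: if both signals leave the sender together, the gap between their arrivals grows linearly with distance. The sketch below shows only that general principle and is not the implementation of any cited system. The function name and the speed of sound (343 m/s, roughly air at 20 °C) are assumptions.

```python
SPEED_OF_SOUND = 343.0          # m/s, assumed (air at about 20 degrees C)
SPEED_OF_LIGHT = 299_792_458.0  # m/s, propagation speed of the RF signal


def distance_from_arrival_gap(gap_seconds: float,
                              v_sound: float = SPEED_OF_SOUND,
                              v_rf: float = SPEED_OF_LIGHT) -> float:
    """Distance implied by the arrival gap between two simultaneous signals.

    With travel times d / v_sound and d / v_rf, the gap is
    d * (1 / v_sound - 1 / v_rf), which is solved for d here.
    """
    if gap_seconds < 0:
        raise ValueError("the acoustic signal cannot arrive before the RF signal")
    return gap_seconds / (1.0 / v_sound - 1.0 / v_rf)


if __name__ == "__main__":
    # A 10 ms gap corresponds to roughly 3.43 m under the assumed sound speed.
    print(f"{distance_from_arrival_gap(0.010):.3f} m")
```

Because the RF signal is so much faster than sound, the result is nearly identical to simply multiplying the gap by the speed of sound.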

WalkieLokie uses a dummy speaker for synchronization
and ranging. The synchronization information is obtained
by a novel position estimation method that does not need
any special hardware or additional communication
channels. Another difference is that these systems rely
only on the ranging results of anchors, which requires
multiple speakers (> 3), whereas WalkieLokie also
estimates the direction from the phone to the speaker, so
only one speaker is needed for localization.
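
Why a single anchor suffices once both distance and direction are known can be seen from a polar-to-Cartesian conversion: one range and one bearing already fix a point in the plane. The sketch below is illustrative only. The function name and the angle convention (bearing measured counter-clockwise from the x-axis of the user's frame) are assumptions, not definitions taken from this paper.

```python
import math


def relative_position(distance_m: float, bearing_deg: float) -> tuple:
    """Speaker position (x, y) relative to the phone.

    ASSUMPTION: bearing_deg is measured counter-clockwise from the
    x-axis of the user's reference frame.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    theta = math.radians(bearing_deg)
    return distance_m * math.cos(theta), distance_m * math.sin(theta)


if __name__ == "__main__":
    x, y = relative_position(2.0, 30.0)
    print(f"x = {x:.3f} m, y = {y:.3f} m")
```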

7.2 Direction Estimation 

Most direction estimation methods also require
specialized hardware, such as a directional antenna
[5], [10], [18] or an antenna array [4], [18], [19]. For
example, by rotating the beam of a directional antenna,
a receiver can pinpoint the direction of the AP as the
direction that provides the highest received signal
strength [18]. For an antenna array [4], [18], [19], the
receiving time of the signal differs at each antenna, and
the magnitude of the difference corresponds to the angle
of arrival of the signal.
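
The relation between arrival-time difference and angle can be sketched for the simplest case of two antennas. Under a far-field (plane-wave) assumption, a wave arriving at angle θ from broadside travels an extra spacing × sin θ to reach the second antenna. The function name, the two-element geometry, and the example spacing are assumptions made for illustration and are not taken from the cited systems.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def angle_of_arrival_deg(delay_s: float, spacing_m: float,
                         wave_speed: float = SPEED_OF_LIGHT) -> float:
    """Angle of arrival, in degrees from broadside, for a two-element array.

    Far-field assumption: the extra path to the second antenna is
    spacing * sin(theta), so sin(theta) = wave_speed * delay / spacing.
    """
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    sin_theta = wave_speed * delay_s / spacing_m
    if abs(sin_theta) > 1.0:
        raise ValueError("delay is too large for this antenna spacing")
    return math.degrees(math.asin(sin_theta))


if __name__ == "__main__":
    spacing = 0.06                    # metres, hypothetical antenna spacing
    delay = 0.03 / SPEED_OF_LIGHT     # extra 3 cm of path at the far antenna
    print(f"{angle_of_arrival_deg(delay, spacing):.1f} degrees")
```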

There are also proposals that do not require specialized
hardware. [21] emulates the functionality of a
directional antenna by rotating the phone around the
user's body to locate outdoor APs. [14] leverages the
multiple microphones of the smartphone and communication
channels for positioning within 4 meters, which is used
for short-distance positioning and phone-to-phone games.
Some other methods leverage the Doppler effect by
swinging [11] or shaking [3] the phone. [20] calculates
the direction from head nodding or shaking using smart
glasses. These methods rely on the fact that the
frequency shift differs when the phone moves in different
directions. Compared to [3], [11], WalkieLokie goes a
step further: a user can obtain the direction without
performing any additional actions with the phone, and can
therefore get the real-time direction while walking.
Furthermore, [3] also requires only speakers as anchors,
but does not address the ranging problem, whereas
WalkieLokie can compute both the direction and the
distance from the phone to the speaker.
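
The Doppler-based idea can be sketched as follows: a receiver moving at speed v at angle θ to the line towards a stationary source observes a shift of roughly f·v·cos θ / c, so the shift reveals the angle when v is known. This is a generic illustration of the principle, not the method of [3], [11], or [20]. The function names, the 20 kHz carrier, and the 343 m/s sound speed are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at about 20 degrees C)


def doppler_shift_hz(carrier_hz: float, speed_mps: float, angle_deg: float,
                     v_sound: float = SPEED_OF_SOUND) -> float:
    """Frequency shift seen by a receiver moving relative to a fixed source.

    angle_deg is the angle between the direction of motion and the line
    from the receiver to the source (0 means moving straight towards it).
    """
    return carrier_hz * speed_mps * math.cos(math.radians(angle_deg)) / v_sound


def angle_from_shift_deg(shift_hz: float, carrier_hz: float, speed_mps: float,
                         v_sound: float = SPEED_OF_SOUND) -> float:
    """Invert the relation above; the result lies in [0, 180] degrees.

    The left/right ambiguity cannot be resolved from a single shift.
    """
    cos_theta = shift_hz * v_sound / (carrier_hz * speed_mps)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding noise
    return math.degrees(math.acos(cos_theta))


if __name__ == "__main__":
    shift = doppler_shift_hz(20_000.0, 1.0, 60.0)  # hypothetical 20 kHz tone
    print(f"shift: {shift:.2f} Hz")
    print(f"recovered angle: {angle_from_shift_deg(shift, 20_000.0, 1.0):.1f} degrees")
```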

8 Conclusion 

We propose and implement WalkieLokie, a localization
scheme that calculates the relative position from a smart
device to a dummy speaker. The dummy speaker only
needs to emit acoustic signals at inaudible frequencies,
so COTS speakers can serve as anchors. Furthermore,
WalkieLokie directly obtains both the distance and the
direction from the smart device to the speaker, unlike
existing localization systems, which can obtain only
the distance or only the direction. As a result,
WalkieLokie requires only one anchor for localization,
while other systems need multiple anchors to calculate
the final position, for example by trilateration. By
reducing the number of required anchors to one,
WalkieLokie is not only capable of indoor localization,
but also has potential for wider applications, such as
augmented-reality or mobile social applications.